-
Notifications
You must be signed in to change notification settings - Fork 42
WIP Adding new resolver creators/datacite.yml #3510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #3510 +/- ##
==========================================
+ Coverage 48.41% 48.86% +0.44%
==========================================
Files 593 605 +12
Lines 42150 43081 +931
Branches 1388 1444 +56
==========================================
+ Hits 20409 21052 +643
- Misses 21572 21858 +286
- Partials 169 171 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…es ui/queries/schema and resolvers to match workflow
…utors, name/orcid/link to ON account
…utor component file name
nellh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a deeper look at the specification, I found a few issues with how we've approached this. The implementation here looks good but I think there's a few changes required to align with the current datacite metadata schema.
https://datacite-metadata-schema.readthedocs.io/_/downloads/en/4.6/pdf/
| ) | ||
| } | ||
| if (authors.length) { | ||
| if (authors.length) { // TODO - NELL - this was switched to contributors - is that correct? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we want to match contributors once that's available.
| const dataciteData = await dataciteCache.get(() => | ||
| getDataciteYml(datasetId, revision) | ||
| ) | ||
| if (dataciteData && Array.isArray(dataciteData.authors)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like our example here was actually not very representative of a correct implementation.
Digging deeper into 4.6, the authors data is actually part of creators:
https://schema.datacite.org/meta/kernel-4.6/
Here's a better example of structured author metadata.
creators:
- creatorName:
nameType: Personal
value: Smith, John
givenName: John
familyName: Smith
nameIdentifiers:
- nameIdentifierScheme: ORCID
schemeUri: https://orcid.org
nameIdentifier: https://orcid.org/0000-0002-1825-0097There is also a contributor field that is meant for non-author roles.
| if (!parsedContributors || parsedContributors.length === 0) { | ||
| const descriptionJsonCache = new CacheItem( | ||
| redis, | ||
| CacheType.datasetDescription, | ||
| [datasetId, revisionShort], | ||
| ) | ||
| try { | ||
| const datasetDescriptionJson = await descriptionJsonCache.get(() => | ||
| getDescriptionObject(datasetId, revision).then( | ||
| (uncachedDescription) => ({ | ||
| id: revision, | ||
| ...uncachedDescription, | ||
| }), | ||
| ) | ||
| ) | ||
| if ( | ||
| datasetDescriptionJson && | ||
| Array.isArray(datasetDescriptionJson.Authors) | ||
| ) { | ||
| parsedContributors = normalizeBidsAuthors( | ||
| datasetDescriptionJson.Authors, | ||
| ) | ||
| Sentry.captureMessage( | ||
| `Loaded contributors from dataset_description.json for ${datasetId}:${revision}`, | ||
| ) | ||
| } | ||
| } catch (error) { | ||
| Sentry.captureException(error) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better if it calls the description resolver that exists to avoid reimplementing parts of it.
| # Single list of files to download this snapshot (only available on snapshots) | ||
| downloadFiles: [DatasetFile] | ||
| # Authors list from datacite.yml || dataset_description.json | ||
| contributors: [Contributor] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this, since datacite.yml makes the distinction between creators and contributors, we might want to call this creator so we can use contributors later (for datacite.yml contributor fields) instead of naming the creators field as contributors in our API schema. That is likely to be confusing if we want to use both fields and it looks like the correct way of adding things like curation contributors is the optional contributors field over adding them as authors (creators in datacite.yml).
| /** | ||
| * Attempts to read and parse datacite.yml. | ||
| */ | ||
| const getDataciteYml = async ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more error case to consider is this:
resourceType:
resourceTypeGeneral: JournalArticleAnything other than dataset in this field means that it is probably an unrelated datacite.yml. It's not that unlikely to annotated the software or article like this, so we should check that resourceTypeGeneral is Dataset.
…o use creators, uses new datacite schema (4.6) for parsing and also checks that resourceTypeGeneral is Dataset before parsing, otherwise fallback uses description resolver.
…ng - adding datasite.yml/.yaml to Git Attributes patterns to avoid annexing
… - needs hook up with #3510(or future update) and datacite.yml
refactor of how contributor/(author) information is sourced, processed, and displayed across the application by prioritizing datacite.yml,
Notes